Using Factual Density to Measure Informativeness of Web Documents
Abstract
Information obtained from the Web is increasingly important for decision making and for our everyday tasks. With the growth of uncertified sources, the blogosphere, comments on social media and automatically generated texts, measuring the quality of textual information found on the Internet is becoming crucially important. It has been suggested that factual density can be used to measure the informativeness of text documents. However, this was only shown on very specific texts such as Wikipedia articles. In this work we move to arbitrary Internet texts and show that factual density can be used to measure the informativeness of the textual content of arbitrary Web documents. To this end, we compiled a human-annotated reference corpus to serve as ground truth for assessing how well document informativeness can be predicted automatically. Our corpus consists of 50 documents randomly selected from the Web, which were ranked by 13 human annotators using the MaxDiff technique. We then ranked the same documents automatically using ExtrHech, an open information extraction system. The two rankings correlate, with Spearman’s coefficient ρ = 0.41 at a significance level of 99.64%.
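The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes factual density is the number of extracted facts divided by document length in words (the abstract does not state the length unit), and the function names and the toy numbers are hypothetical. Spearman's ρ is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from typing import Sequence


def factual_density(fact_count: int, word_count: int) -> float:
    """Facts per word. Using words as the length unit is an assumption."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return fact_count / word_count


def average_ranks(values: Sequence[float]) -> list:
    """Rank values from 1 (smallest) to n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy data, invented for illustration only (not taken from the study).
    human_scores = [0.9, 0.2, 0.5, 0.7]           # hypothetical annotator scores
    fact_counts = [12, 3, 4, 20]                  # hypothetical extracted facts
    word_counts = [100, 150, 80, 400]             # hypothetical document lengths
    densities = [factual_density(f, w) for f, w in zip(fact_counts, word_counts)]
    print(spearman_rho(human_scores, densities))
```

On real data, the first list would hold per-document scores derived from the human annotation and the second the densities computed from the output of an extraction system; a coefficient near 1 would indicate close agreement between the two rankings.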
Similar resources
The Appropriateness of the Factual Density
In circumstances where the receptivity of the online news is affected by the media bias in covering public attention events, the quality of the textual component is of pervasive importance for a reliable perception of their informativeness. Aware of this threat, several natural language processing techniques have been developed for the purpose of capturing the quality of the web content based o...
Assessing the Quality of Online News Articles as References for an Encyclopaedia Entry
The quality of online news articles is decisive both for a reliable perception of their informativeness and for including them as a reference when creating an encyclopaedia entry for a public attention event. Tackling the enormous volume, variety and complexity of different articles disseminated online, several natural language processing techniques have been developed for the purpose of captur...
Informativeness for Adhoc IR Evaluation: A Measure that Prevents Assessing Individual Documents
Informativeness measures have been used in interactive information retrieval and automatic summarization evaluation. Indeed, as opposed to adhoc retrieval, these two tasks cannot rely on the Cranfield evaluation paradigm in which retrieved documents are compared to static query relevance document lists. In this paper, we explore the use of informativeness measures to evaluate adhoc task. The ad...
Information density, Heaps' Law, and perception of factiness in news
Seeking information online can be an exercise in time wasted wading through repetitive, verbose text with little actual content. Some documents are more densely populated with factoids (fact-like claims) than others. The densest documents are potentially the most efficient use of time, likely to include the most information. Thus some measure of “factiness” might be useful to readers. Based on ...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Publication date: 2013